Hadoop: Writing MapReduce Code in Eclipse and Running Word Count on Hadoop

I. Required JARs

hadoop-2.4.1\share\hadoop\hdfs\hadoop-hdfs-2.4.1.jar
hadoop-2.4.1\share\hadoop\hdfs\lib\ (all JARs)

hadoop-2.4.1\share\hadoop\common\hadoop-common-2.4.1.jar
hadoop-2.4.1\share\hadoop\common\lib\ (all JARs)

hadoop-2.4.1\share\hadoop\mapreduce\ (all JARs except hadoop-mapreduce-examples-2.4.1.jar)
hadoop-2.4.1\share\hadoop\mapreduce\lib\ (all JARs)
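If the project is a Maven project instead (see step 1 in Part III below), the same libraries can be pulled in with a single hadoop-client dependency rather than adding JARs by hand; a minimal sketch, assuming version 2.4.1 to match the cluster:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.4.1</version>
</dependency>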

II. Code

Mapper class

package kgc.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reuse the same Writable objects across calls instead of
    // allocating new ones for every record.
    static IntWritable one = new IntWritable(1);
    static Text word = new Text("");

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the input line into whitespace-separated tokens.
        StringTokenizer words = new StringTokenizer(value.toString());
        //String[] words = value.toString().split("\\s+");
        while (words.hasMoreTokens()) {
            // Emit (word, 1) for every token in the line.
            word.set(words.nextToken());
            context.write(word, one);
        }
    }
}
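For an input line such as "hello world hello", this mapper emits (hello, 1), (world, 1), (hello, 1). Reusing the same Text and IntWritable objects is safe here because context.write serializes the pair as soon as it is called.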

Reducer class

package kgc.mapred;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum up the 1s emitted by the mappers for this word.
        int count = 0;
        for (IntWritable num : values) {
            count = count + num.get();
        }
        context.write(key, new IntWritable(count));
    }
}
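The framework sorts and groups the map output by key before the reduce phase, so for the example line above this reducer receives (hello, [1, 1]) and (world, [1]) and writes out hello 2 and world 1.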

Driver (main) class for job submission

package kgc.mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount 
{
    public static void main( String[] args ) throws Exception
    {
        // Hadoop configuration
        Configuration cfg = new Configuration();
        // Create the job
        Job job = Job.getInstance(cfg, "WordCountMR");

        // Set the JAR that contains the Mapper and Reducer definitions
        job.setJar("wordcount-0.0.1.jar");
        //job.setJarByClass(WordCount.class);

        // Class that handles the map tasks
        job.setMapperClass(WordCountMapper.class);
        // Class that handles the reduce tasks
        job.setReducerClass(WordCountReducer.class);

        // Input format: TextInputFormat splits the input into lines
        // (terminated by \r and/or \n) and hands each line to the mapper
        job.setInputFormatClass(TextInputFormat.class);
        // Input path
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Final output format
        job.setOutputFormatClass(TextOutputFormat.class);
        // Final output path (must not already exist)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Output key/value types, shared by the map and reduce phases
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);

    }
}
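Because the reduce operation here is a plain sum (associative and commutative), the same reducer class can optionally double as a combiner, pre-aggregating (word, 1) pairs on the map side to cut shuffle traffic. A sketch of the one extra line, added anywhere before job.waitForCompletion:

        // Optional: pre-aggregate map output locally with the same reducer logic
        job.setCombinerClass(WordCountReducer.class);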

III. Upload the JAR and run it on Hadoop

1. If your project is a Maven project, you can generate the JAR directly with Run As --> Maven install. The remaining steps are the same.

2. For an ordinary Java project: File --> Export --> Runnable JAR file --> choose a save path --> select the first radio button --> generate the JAR.

Then upload the JAR file to the virtual machine with Xshell.

3. Upload the file to be processed to the virtual machine with Xshell in the same way, then place it in HDFS with the following command:

hadoop fs -put <current local path of the file> <destination path in HDFS>
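For example, assuming a local file /home/hadoop/words.txt and a hypothetical HDFS input directory /wordcount/input:

hadoop fs -mkdir -p /wordcount/input
hadoop fs -put /home/hadoop/words.txt /wordcount/input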

4. Run the JAR file to process the file, using the following command:

yarn jar <JAR file> <input path to process> <output path> (make sure the output path does not already exist)
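For example, with the hypothetical paths from step 3 (if the JAR was exported as a runnable JAR its manifest already names the main class; otherwise insert the driver class name, here kgc.mapred.WordCount, right after the JAR file name):

yarn jar wordcount-0.0.1.jar /wordcount/input /wordcount/output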

5. The console will then display the job, report its progress as it runs, and finally print a completion summary.
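When the job succeeds, the reducer output appears in the output directory as files named part-r-00000, part-r-00001, and so on. With the hypothetical paths above, you can inspect the result with:

hadoop fs -cat /wordcount/output/part-r-00000

For an input line such as "hello world hello" this prints something like "hello 2" and "world 1" (TextOutputFormat separates key and value with a tab).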


Of course, this is visible not only in the console but also in Hadoop's web UIs: open IP:8088 in a browser (the YARN ResourceManager UI) to follow the job's progress, or IP:50070 (the NameNode UI) to check the generated files.

